Project-Team:Maxplus

Inria | Raweb 2014 | Presentation of the Project-Team Maxplus | Maxplus Web Site



Section: New Results

Algorithmes/Algorithms

Itération sur les politiques pour le contrôle stochastique et les jeux répétés à somme nulle/Policy iterations for stochastic control and repeated zero sum games

Participants : Marianne Akian, Stéphane Gaubert.

L'algorithme d'itération sur les politiques est bien connu pour résoudre efficacement les équations de la programmation dynamique associées à des problèmes de contrôle stochastique avec critère à horizon infini (Howard) ou ergodique (Howard, et Denardo et Fox). Il a aussi été développé dans le cas de jeux à deux joueurs et somme nulle actualisés (Denardo) ou ergodiques (Hoffman et Karp).

Des résultats récents de Ye ainsi que Hansen, Miltersen et Zwick montrent que l'algorithme d'itération sur les politiques, restreint à la classe des jeux à somme nulle (à 1 ou 2 joueurs) actualisés de facteur d'actualisation donné, est fortement polynomial. Dans [58] , on montre que ceci est le cas aussi pour l'algorithme d'itération sur les politiques pour les jeux à somme nulle et paiement moyen, restreint à la classe des jeux qui ont un temps moyen de retour ou d'arrivée à un état donné borné. La preuve utilise des techniques de théorie de Perron-Frobenius non-linéaire, permettant de ramener le problème à paiement moyen à un problème actualisé (de facteur d'actualisation dépendant de l'état et des actions). La même technique permet aussi de traiter le cas de jeux à somme nulle actualisés dont le facteur d'actualisation peut dépendre de l'état et des actions et prendre éventuellement des valeurs supérieures à 1. Récemment, on a montré que la borne pour le cas des jeux à somme nulle et paiement moyen s'applique aussi au cas des jeux actualisés de facteur d'actualisation constant [31] , [32] , [45] . Ce dernier résultat est inspiré par des résultats récents de Post et Ye et de Scherrer concernant les algorithmes du simplexe et d'itération sur les politiques pour les problèmes de contrôle optimal (ou jeux à 1 joueur).

English version

Policy iteration is a powerful and well-known algorithm for solving the dynamic programming equations associated with stochastic control (one-player game) problems with an infinite horizon criterion (Howard) or an ergodic criterion (Howard, and Denardo and Fox). It has also been developed for zero-sum two-player games, both in the discounted case (Denardo) and in the ergodic case (Hoffman and Karp).

Recent results of Ye, and of Hansen, Miltersen and Zwick, show that policy iteration for one- or two-player (perfect information) zero-sum stochastic games, restricted to instances with a fixed discount rate, is strongly polynomial. In [58], we show that policy iteration for mean-payoff zero-sum stochastic games is also strongly polynomial when restricted to instances in which the mean time of first return (or first arrival) to a given state is bounded. The proof is based on methods of nonlinear Perron-Frobenius theory, which allow us to reduce the mean-payoff problem to a discounted problem with a state-dependent discount rate. Our analysis also shows that policy iteration remains strongly polynomial for discounted problems in which the discount rate can be state-dependent (and even negative at certain states), provided that the spectral radii of the nonnegative matrices associated with all strategies are bounded from above by a fixed constant strictly less than 1. Recently, we have proved that the bound obtained for mean-payoff zero-sum stochastic two-player games also holds for discounted games with a constant discount factor [31], [32], [45]. The latter result was inspired by recent results of Post and Ye, and of Scherrer, concerning simplex and policy iteration algorithms for Markov decision processes (one-player games).
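To make the algorithm under discussion concrete, here is a minimal sketch of Howard's policy iteration for a one-player discounted problem (a finite Markov decision process with rewards to be maximized). The function names, the data layout and the two-state example are assumptions made for illustration; they are not taken from [58] or from the cited papers, and the sketch does not cover the two-player or mean-payoff variants.

```python
from fractions import Fraction


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def policy_iteration(rewards, transitions, gamma):
    """rewards[s][a] is the reward of action a in state s;
    transitions[s][a][t] is the probability of moving from s to t under a;
    gamma is the discount factor, with 0 <= gamma < 1."""
    n = len(rewards)
    policy = [0] * n
    while True:
        # Policy evaluation: solve (I - gamma * P_policy) v = r_policy.
        a = [[(1 if i == j else 0) - gamma * transitions[i][policy[i]][j]
              for j in range(n)] for i in range(n)]
        b = [rewards[i][policy[i]] for i in range(n)]
        v = solve_linear(a, b)

        # Policy improvement: switch to a strictly better action where one exists.
        improved = False
        for s in range(n):
            def q(action):
                return rewards[s][action] + gamma * sum(
                    p * x for p, x in zip(transitions[s][action], v))
            best = max(range(len(rewards[s])), key=q)
            if q(best) > q(policy[s]):
                policy[s] = best
                improved = True
        if not improved:
            return policy, v


# Hypothetical example: in each state, action 0 stays put and action 1 moves
# to the other state with reward 0.
rewards = [[1, 0], [3, 0]]
transitions = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
policy, value = policy_iteration(rewards, transitions, Fraction(1, 2))
```

Exact rational arithmetic is used here only so that the strict-improvement test is not affected by rounding; the complexity results quoted above concern the number of iterations of the outer loop.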

Algorithmique des polyèdres tropicaux/Algorithmics of tropical polyhedra

Participants : Xavier Allamigeon, Pascal Benchimol, Stéphane Gaubert, Eric Goubault [CEA] , Michael Joswig [TU Berlin] .

X. Allamigeon, S. Gaubert et E. Goubault ont développé dans [70], [72] plusieurs algorithmes permettant de manipuler des polyèdres tropicaux. Ceux-ci correspondent aux travaux décrits dans § 6.3.1. Ils permettent notamment de déterminer les sommets et rayons extrêmes d'un polyèdre tropical défini comme intersection de demi-espaces, ou inversement, de calculer une représentation externe à partir d'un ensemble de générateurs. Ces algorithmes sont implémentés dans la bibliothèque TPLib (voir § 5.3).

Dans un travail de X. Allamigeon, P. Benchimol, S. Gaubert et M. Joswig [51] , nous avons défini un analogue tropical de l'algorithme du simplexe qui permet de résoudre les problèmes de programmation linéaire tropicale, i.e.

$$
\begin{array}{ll}
\text{minimize} & \max_{1 \leq j \leq n} (c_{j} + x_{j}) \\
\text{subject to} & \max\bigl(\max_{1 \leq j \leq n} (a_{ij}^{+} + x_{j}),\, b_{i}^{+}\bigr) \geq \max\bigl(\max_{1 \leq j \leq n} (a_{ij}^{-} + x_{j}),\, b_{i}^{-}\bigr), \quad i = 1, \dots, m \\
& x \in (\mathbb{R} \cup \{-\infty\})^{n}
\end{array}
\tag{13}
$$

où les entrées du programme $a_{ij}^{\pm}$, $b_{i}^{\pm}$, $c_{j}$ sont à valeurs dans $\mathbb{R} \cup \{-\infty\}$. Ces problèmes sont intimement liés à la résolution de jeux répétés à somme nulle, puisque résoudre un jeu à paiement moyen déterministe est équivalent à déterminer si un problème de programmation linéaire tropicale admet un point réalisable [59].

Comme son homologue usuel, le simplexe tropical pivote entre des points de base (tropicaux), jusqu'à atteindre l'optimum du programme linéaire. La différence fondamentale avec l'algorithme du simplexe classique est que le pivotage est réalisé de manière purement combinatoire, en s'appuyant sur des descriptions locales du polyèdre tropical défini par les contraintes à l'aide d'(hyper)graphes orientés. Ceci nous a permis de prouver que l'étape de pivotage (incluant le calcul des coûts réduits) a la même complexité en temps que dans l'algorithme classique, i.e. $O (n (m + n))$ . Ceci est d'autant plus inattendu que la structure des arêtes tropicales entre deux points de base sont géométriquement plus complexes (elles sont constituées de plusieurs segments de droite, jusqu'à $n$ ).

Le simplexe tropical a la propriété d'être fortement corrélé avec l'algorithme du simplexe classique. Grâce au principe de Tarski, le simplexe usuel peut être transposé tel quel sur des programmes linéaires dont les coefficients en entrée sont non plus des réels, mais des éléments du corps $\mathbb{R}\{\{t\}\}$ des séries de Puiseux généralisées en une certaine indéterminée $t$, i.e. des objets de la forme :

$$
c_{\alpha_{1}} t^{\alpha_{1}} + c_{\alpha_{2}} t^{\alpha_{2}} + \dots
\tag{14}
$$

où les $\alpha_{i}$ sont des réels, les coefficients $c_{\alpha_{i}}$ sont des réels non nuls, et où la suite des $\alpha_{1}, \alpha_{2}, \dots$ est strictement croissante et soit finie, soit non bornée. L'opposé du plus petit exposant de la série, $-\alpha_{1}$, est appelé valuation de la série. Un programme linéaire tropical est dit relevé en un programme linéaire sur $\mathbb{R}\{\{t\}\}$ si les valuations des coefficients en entrée de ce dernier sont égales aux coefficients du problème tropical. Dans nos travaux, nous avons établi la correspondance suivante entre le simplexe usuel et le simplexe tropical : pour tout programme linéaire tropical générique, l'algorithme du simplexe tropical trace l'image par la valuation du chemin suivi par l'algorithme du simplexe usuel sur n'importe quel relèvement du programme tropical dans $\mathbb{R}\{\{t\}\}$.

Les résultats présentés ci-dessus sont rassemblés dans l'article [51]. Ils ont fait l'objet de plusieurs présentations en conférence [67], [68], [27].

Ces résultats ouvrent la possibilité de relier la complexité de l'algorithme du simplexe usuel avec celle des jeux déterministes. Pour ces derniers, on sait seulement que leur résolution est dans la classe de complexité $\mathsf{NP} \cap \mathsf{coNP}$, et on ignore s'il existe un algorithme de complexité polynomiale. De façon similaire, on ne sait pas caractériser de façon précise la complexité de l'algorithme du simplexe usuel. Celle-ci dépend fortement de la règle de pivotage utilisée, et il existe des problèmes sur lesquels de nombreuses règles de pivotage ont une complexité exponentielle. L'existence d'une règle de pivotage qui permettrait au simplexe de terminer en temps polynomial sur n'importe quelle instance est encore aujourd'hui une question ouverte.

Dans un deuxième travail, nous avons relié les deux problèmes ouverts précédents, grâce à l'algorithme du simplexe tropical. Nous avons en effet exhibé une classe de règles de pivotage, dites combinatoires, et avons montré qu'elles satisfont la propriété suivante : s'il existe une règle de pivotage combinatoire qui permet de résoudre tout problème de programmation linéaire usuel en temps polynomial, alors on peut résoudre les jeux à paiement moyen en temps (fortement) polynomial. Le terme combinatoire fait référence au fait que la règle est définie en fonction du signe des mineurs de la matrice des coefficients du problème linéaire. Ce résultat est décrit dans l'article [49] , et a été présenté dans plusieurs conférences [39] , [40] .

Enfin, dans un travail de X. Allamigeon, P. Benchimol et S. Gaubert [26] , nous avons étendu les résultats aux règles de pivotage semi-algébriques, classe incluant la règle dite du shadow-vertex. Celle-ci est connue pour avoir fourni plusieurs bornes de complexité moyenne et lisse sur l'algorithme du simplexe. Nous avons donc tropicalisé l'algorithme du simplexe shadow-vertex, et nous avons montré que cet algorithme permet de résoudre les jeux à paiement moyen en temps polynomial en moyenne.

English version

X. Allamigeon, S. Gaubert and E. Goubault have developed in [70], [72] several algorithms for manipulating tropical polyhedra. They correspond to the contributions described in § 6.3.1. In particular, they can be used to determine the vertices and extreme rays of a tropical polyhedron defined as an intersection of half-spaces or, conversely, to compute an external description from a set of generators. These algorithms are implemented in the library TPLib (see § 5.3).

In a work of X. Allamigeon, P. Benchimol, S. Gaubert and M. Joswig [51], we introduced a tropical analogue of the simplex algorithm, which solves tropical linear programming problems. These are of the form (13), where the coefficients of the program, $a_{ij}^{\pm}$, $b_{i}^{\pm}$, $c_{j}$, take their values in the max-plus semiring $\mathbb{R} \cup \{-\infty\}$. These problems are closely related to mean payoff games, since solving a game of this kind is equivalent to determining whether a tropical linear program admits a feasible point [59].
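As an illustration of the form (13), the following sketch evaluates the objective and checks the constraints of a tropical linear program at a given point, using `float("-inf")` for the element $-\infty$ of the max-plus semiring. The function names and the small instance are hypothetical; this only evaluates a candidate point and is not the tropical simplex algorithm.

```python
NEG_INF = float("-inf")  # the zero element of the max-plus semiring


def tropical_affine(coeffs, x, constant):
    """Return max(max_j (coeffs[j] + x[j]), constant)."""
    return max(max(c + xj for c, xj in zip(coeffs, x)), constant)


def is_feasible(a_plus, b_plus, a_minus, b_minus, x):
    """Check the m two-sided constraints of (13) at the point x."""
    return all(
        tropical_affine(ap, x, bp) >= tropical_affine(am, x, bm)
        for ap, bp, am, bm in zip(a_plus, b_plus, a_minus, b_minus)
    )


def objective(c, x):
    """Return max_j (c[j] + x[j]), the quantity minimized in (13)."""
    return max(cj + xj for cj, xj in zip(c, x))


# Hypothetical instance with n = 2 and m = 2:
#   x1 >= max(x2 - 1, 0)   and   x2 >= 1,   minimize max(x1, x2).
a_plus = [[0, NEG_INF], [NEG_INF, 0]]
b_plus = [NEG_INF, NEG_INF]
a_minus = [[NEG_INF, -1], [NEG_INF, NEG_INF]]
b_minus = [0, 1]
c = [0, 0]
```

In this instance the point `[0, 1]` satisfies both constraints, whereas `[0, 2]` violates the first one and `[NEG_INF, 1]` shows that coordinates equal to $-\infty$ are handled by ordinary floating-point arithmetic.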

Like the classical simplex algorithm, the tropical simplex algorithm performs pivoting operations between (tropical) basic points until it reaches the optimum. The main difference with the classical algorithm is that pivoting is now a purely combinatorial operation, performed using a local description of the polyhedron by a directed hypergraph. This allowed us to show that a tropical pivoting step (including the computation of reduced costs) has the same time complexity as in the classical simplex algorithm, i.e. $O(n(m + n))$. This is all the more surprising as the edge between two basic points has a geometrically more complex structure in the tropical case (it consists of up to $n$ ordinary line segments).

The tropical simplex algorithm turns out to be closely related to the classical one. Thanks to Tarski's principle, the latter is also valid for linear programs over the field $\mathbb{R}\{\{t\}\}$ of generalized Puiseux series in an indeterminate $t$. These series are of the form (14), where the $\alpha_{i}$ are real numbers, the coefficients $c_{\alpha_{i}}$ are non-zero reals, and the sequence $\alpha_{1}, \alpha_{2}, \dots$ is strictly increasing and either finite or unbounded. The opposite of the smallest exponent of the series, $-\alpha_{1}$, is called the valuation of the series. A linear program over $\mathbb{R}\{\{t\}\}$ is said to be a lift of a tropical linear program if the valuation sends the coefficients of the former to the coefficients of the latter. We showed the following relation between the classical simplex algorithm and its tropical analogue: for every generic tropical linear program, the tropical simplex algorithm computes the image by the valuation of the path followed by the classical simplex algorithm, applied to any lift in $\mathbb{R}\{\{t\}\}$ of the original program.
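The valuation defined above can be illustrated on series with finitely many terms. The sketch below stores such a series as a dictionary mapping each exponent to its non-zero coefficient; this representation and the helper names `valuation`, `add` and `mul` are assumptions made for illustration. It shows that the valuation turns products into sums, and sums into maxima as long as the leading terms do not cancel.

```python
def valuation(series):
    """Opposite of the smallest exponent; -infinity for the zero series."""
    exponents = [e for e, coeff in series.items() if coeff != 0]
    return -min(exponents) if exponents else float("-inf")


def add(p, q):
    """Sum of two finite series, dropping terms that cancel."""
    result = dict(p)
    for e, coeff in q.items():
        result[e] = result.get(e, 0) + coeff
    return {e: coeff for e, coeff in result.items() if coeff != 0}


def mul(p, q):
    """Product of two finite series."""
    result = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            result[e1 + e2] = result.get(e1 + e2, 0) + c1 * c2
    return {e: coeff for e, coeff in result.items() if coeff != 0}


# p = 3 t^(-2) + t^(1/2) has valuation 2; q = 5 t^(-1) has valuation 1.
p = {-2: 3, 0.5: 1}
q = {-1: 5}
product_valuation = valuation(mul(p, q))       # valuation(p) + valuation(q)
sum_valuation = valuation(add(p, q))           # max of the two valuations
cancelled = valuation(add(p, {-2: -3}))        # leading terms cancel
```

The last line is the degenerate situation: when leading terms cancel, the valuation of a sum drops below the maximum of the valuations.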

These results are gathered in the article [51]. They have been presented at several conferences [67], [68], [27].

They allow one to relate the complexity of the classical simplex algorithm to the complexity of mean payoff games. The latter is unsettled: these games are known to be in the class $\mathsf{NP} \cap \mathsf{coNP}$, but it is not known whether they can be solved in polynomial time. Basic complexity issues regarding the classical simplex algorithm are also unsettled: its execution time depends on the pivoting rule, and many pivoting rules have been shown to have exponential worst-case behavior. The existence of a pivoting rule under which the simplex algorithm terminates in polynomial time on every instance is still an open question.

In a second work, we related these two open questions via the tropical simplex algorithm. We identified a class of pivoting rules, said to be combinatorial, and showed that they have the following property: if there is a combinatorial pivoting rule allowing one to solve every classical linear programming problem in polynomial time, then mean payoff games can be solved in (strongly) polynomial time. By combinatorial, we mean that the rule depends on the coefficients of the system only through the signs of the minors of the coefficient matrix. This result is given in the article [49]. It has been presented at the conferences [39], [40].

Finally, in a work of X. Allamigeon, P. Benchimol and S. Gaubert [26], we extended the latter results to semi-algebraic pivoting rules, a class which includes the so-called shadow-vertex rule. This rule has been exploited in the literature to establish several average-case and smoothed complexity bounds on the simplex algorithm. We tropicalized the shadow-vertex simplex algorithm, and showed that it solves mean payoff games in polynomial time on average.

Problèmes d'accessibilité dans les hypergraphes orientés et leur complexité/Reachability problems in directed hypergraphs and their complexity

Participant : Xavier Allamigeon.

Les hypergraphes orientés sont une généralisation des graphes orientés, dans lesquels chaque arc relie un ensemble de sommets à un autre. Ils jouent un rôle important dans les travaux récents sur la convexité tropicale (voir § 6.3.1), puisqu'ils offrent une représentation naturelle des cônes définis sur le sous-semi-anneau booléen $\mathbb{B} = \{-\infty, 0\}$.

Dans un travail de X. Allamigeon [66] , on étudie la complexité de problèmes d'accessibilité sur les hypergraphes orientés. Nous introduisons un algorithme de complexité presque linéaire permettant de déterminer les composantes fortement connexes terminales (qui n'accèdent à aucune autre composante si ce n'est elles-mêmes) d'un hypergraphe.

Nous établissons également une borne inférieure sur-linéaire sur la taille de la réduction transitive de la relation d'accessibilité dans les hypergraphes. Cela indique que la relation d'accessibilité dans les hypergraphes orientés est combinatoirement plus complexe que celle des graphes orientés. Cela suggère aussi que des problèmes comme le calcul des composantes fortement connexes sont plus difficiles sur les hypergraphes que sur les graphes. Nous mettons d'ailleurs en évidence une réduction en temps linéaire du problème du calcul des ensembles minimaux dans une famille d'ensembles donnée, vers le problème du calcul de toutes les composantes fortement connexes d'un hypergraphe. Le problème du calcul des ensembles minimaux a été largement étudié dans la littérature [166], [185], [184], [167], [168], [169], [115], [80], et aucun algorithme en temps linéaire n'est connu à ce jour.

English version

Directed hypergraphs are a generalization of directed graphs, in which the tail and the head of each arc are sets of vertices. They play an important role in recent work on tropical convexity (see § 6.3.1), since they offer a natural representation of cones defined over the Boolean sub-semiring $\mathbb{B} = \{-\infty, 0\}$.

In a work of X. Allamigeon [66], we study the complexity of reachability problems on directed hypergraphs. We introduce an almost linear-time algorithm that determines the terminal strongly connected components of a directed hypergraph (a component is said to be terminal when no other component is reachable from it).

We also establish a super-linear lower bound on the size of the transitive reduction of the reachability relation in directed hypergraphs. This indicates that the reachability relation is combinatorially more complex in directed hypergraphs than in directed graphs. It also suggests that reachability problems such as computing all strongly connected components are likely to be harder in hypergraphs than in graphs. In addition, we show that the minimal set problem can be reduced in linear time to the problem of computing all strongly connected components of a hypergraph. The minimal set problem consists in finding all minimal sets in a given family of sets. It has been well studied in the literature [166], [185], [184], [167], [168], [169], [115], [80], and no linear-time algorithm is known.
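For readers unfamiliar with reachability in directed hypergraphs, here is a minimal sketch of the basic single-source computation. It assumes the usual convention, which is not spelled out above, that the head vertices of a hyperarc become reachable once all of its tail vertices are reachable, and that tails are non-empty. The function name and the example are hypothetical; this is plain forward propagation with one counter per hyperarc, not the algorithm of [66] for terminal strongly connected components.

```python
from collections import defaultdict, deque


def reachable(hyperarcs, sources):
    """Return the set of vertices reachable from `sources`.

    hyperarcs is a list of (tail, head) pairs of vertex collections with
    non-empty tails. A hyperarc fires once every vertex of its tail has
    been reached.
    """
    remaining = [len(set(tail)) for tail, _ in hyperarcs]
    arcs_by_tail_vertex = defaultdict(list)
    for index, (tail, _) in enumerate(hyperarcs):
        for vertex in set(tail):
            arcs_by_tail_vertex[vertex].append(index)

    seen = set(sources)
    queue = deque(seen)
    while queue:
        vertex = queue.popleft()
        for index in arcs_by_tail_vertex[vertex]:
            remaining[index] -= 1
            if remaining[index] == 0:  # the whole tail has been reached
                for target in hyperarcs[index][1]:
                    if target not in seen:
                        seen.add(target)
                        queue.append(target)
    return seen


# Hypothetical hypergraph on the vertices 1..4.
hyperarcs = [({1}, {2}), ({2, 3}, {4}), ({1, 2}, {3})]
```

Each vertex is dequeued once and each tail membership is visited once, so the running time is linear in the total size of the hyperarcs. From vertex 1 every vertex is reached, while from vertex 2 alone nothing else is, because every hyperarc leaving 2 needs a second tail vertex.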

Approximation max-plus de fonctions valeurs et équations de Riccati généralisées/Max-plus approximation of value functions and generalized Riccati equations

Participants : Stéphane Gaubert, Zheng Qu.

Les méthodes d'approximation max-plus conduisent à approcher la fonction valeur d'un problème de contrôle ou de jeux par un supremum d'un nombre fini de formes quadratiques, voir notamment [126]. On s'intéresse ici à l'analyse théorique (complexité) ainsi qu'à l'amélioration de ces méthodes. Dans certains cas, ces formes quadratiques sont propagées par des flots d'équations de Riccati généralisées. Afin d'effectuer des analyses d'erreur, on exploite les propriétés de contraction du flot de Riccati pour certaines métriques connues sur le cône des matrices positives, et en particulier pour la métrique de Thompson. Celle-ci n'est rien d'autre que $d_{T}(A, B) = \| \log \operatorname{spec}(A^{-1} B) \|_{\infty}$, où $\operatorname{spec}$ désigne la suite des valeurs propres d'une matrice, et $\log$ s'entend composante par composante.

Ceci nous a amenés à étudier le problème général du calcul du taux de contraction d'un flot monotone sur un cône, pour la métrique de Thompson. En effet, les propriétés de contraction de l'équation de Riccati standard sont connues (résultats de Bougerol pour la métrique riemannienne invariante, et de Wojtkowski pour la métrique de Thompson), mais les techniques de preuve employées dans ce cadre (semigroupes de matrices symplectiques) ne s'étendent pas aux équations généralisées.

On donne dans [16] une formule explicite générale pour le taux de contraction pour la métrique de Thompson d'un flot monotone, faisant seulement intervenir le générateur du flot et sa dérivée. On a notamment appliqué ce résultat à une équation de Riccati généralisée associé à des problèmes de contrôle stochastique avec critère quadratique, dans lesquels la dynamique comporte un terme bilinéaire en le contrôle et le bruit. On a montré dans ce cas que la métrique de Thompson est la seule métrique de Finsler invariante pour laquelle le flot est nonexpansif, et l'on a caractérisé la constante de contraction locale.

Une application de ces résultats de contraction à l'analyse d'une méthode de réduction de la malédiction de la dimension, dûe à McEneaney, a été donnée dans [22] .

Une nouvelle méthode numérique maxplus, de nature randomisée, a été introduite dans [30] , elle fait apparaître de très fortes accélérations par rapport aux méthodes précédentes.

La question de l'émondage des représentations max-plus a été abordée dans [29] , où il est montré qu'une classe de relaxations convexes introduites par Sridharan et al. pour traiter numériquement un problème de contrôle quantique sont en fait exactes (pas de saut de relaxation).

English version

Max-plus approximation methods approximate the value function of an optimal control or zero-sum game problem by a supremum of finitely many quadratic forms, see in particular [126]. We are interested here in the theoretical analysis (complexity) of this class of methods, as well as in their improvement. In certain cases, the quadratic forms are propagated by the flows of generalized Riccati equations. In order to perform an error analysis, we use contraction properties of the Riccati flow with respect to certain known metrics on the cone of positive matrices, in particular Thompson's metric. The latter is given by $d_{T}(A, B) = \| \log \operatorname{spec}(A^{-1} B) \|_{\infty}$, where $\operatorname{spec}$ denotes the sequence of eigenvalues of a matrix, and $\log$ is applied entrywise.
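The formula for Thompson's metric can be evaluated by hand in small dimension. The sketch below is restricted, as a simplifying assumption, to 2 x 2 symmetric positive definite matrices, for which the eigenvalues of $A^{-1} B$ are the two roots of the quadratic $\det(B - \lambda A) = 0$. The function name is hypothetical and only the standard library is used.

```python
import math


def thompson_metric_2x2(a, b):
    """Thompson distance between 2 x 2 symmetric positive definite matrices.

    a and b are given as nested sequences [[m11, m12], [m12, m22]].
    """
    (a11, a12), (_, a22) = a
    (b11, b12), (_, b22) = b
    det_a = a11 * a22 - a12 ** 2
    det_b = b11 * b22 - b12 ** 2
    # det(B - lam * A) = det_a * lam^2 - mixed * lam + det_b
    mixed = a11 * b22 + a22 * b11 - 2 * a12 * b12
    # The discriminant is nonnegative in exact arithmetic; clamp rounding errors.
    root = math.sqrt(max(mixed ** 2 - 4 * det_a * det_b, 0.0))
    lam_max = (mixed + root) / (2 * det_a)
    lam_min = det_b / (det_a * lam_max)  # the product of the roots is det_b / det_a
    return max(abs(math.log(lam_max)), abs(math.log(lam_min)))


identity = [[1, 0], [0, 1]]
stretched = [[2, 0], [0, 0.5]]
distance = thompson_metric_2x2(identity, stretched)
```

For the identity and the diagonal matrix with entries 2 and 1/2, the eigenvalues of $A^{-1} B$ are 2 and 1/2, so the distance is $\log 2$. Scaling a matrix by a factor $s > 0$ moves it to distance $|\log s|$ from the original.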

This led us to study the general problem of computing the contraction rate of an order-preserving flow on a cone, with respect to Thompson's metric. Indeed, the contraction properties of the standard Riccati flow are known (theorem of Bougerol for the invariant Riemannian metric, and of Wojtkowski for Thompson's metric), but the proofs of these properties (based on semigroups of symplectic matrices) do not carry over to generalized Riccati equations.

We gave in [16] a general explicit formula for the contraction rate, with respect to Thompson's metric, of an order-preserving flow, involving only the generator of the flow and its derivative. In particular, we applied this result to a generalized Riccati equation associated with stochastic optimal control problems with a quadratic cost and bilinear dynamics (the dynamics contain a term that is bilinear in the control and the noise). We showed that, in this case, Thompson's metric is the only invariant Finsler metric in which the generalized Riccati flow is nonexpansive, and we characterized the local contraction rate of this flow.

Z. Qu applied these contraction results in [22] to the analysis of a curse-of-dimensionality reduction method introduced by McEneaney.

A new max-plus numerical method, of a randomized nature, has been introduced in [30]. It shows a considerable speedup in comparison with earlier methods.

The question of pruning max-plus representations was dealt with in [29]. It is shown there that a class of convex relaxations introduced by Sridharan et al. to solve numerically a quantum control problem is in fact exact (there is no relaxation gap).

Approximation probabiliste d'équations d'Hamilton-Jacobi-Bellman et itération sur les politiques/Probabilistic approximation of Hamilton-Jacobi-Bellman equations and policy iteration

Participants : Marianne Akian, Eric Fodjo.

La thèse d'Eric Fodjo traite de problèmes de contrôle stochastique (de diffusions) avec critère à horizon infini actualisé ou arrêté, ou moyen en temps long, issus en particulier de problèmes de gestion de portefeuille avec coûts de transaction. La programmation dynamique conduit à une équation aux dérivées partielles d'Hamilton-Jacobi-Bellman, sur un espace de dimension au moins égale au nombre d'actifs risqués. La malédiction de la dimension ne permet pas de traiter numériquement ces équations en dimension grande (supérieure à 5). On se propose d'aborder ces problèmes avec des méthodes numériques associant itération sur les politiques, discrétisations probabilistes, et discrétisations max-plus, afin d'essayer de monter plus en dimension. Une autre piste est de remplacer l'itération sur les politiques par une approximation par des problèmes avec commutations optimales. Ces méthodes devraient aussi s'appliquer au cas de problèmes à horizon fini.

English version

The PhD thesis of Eric Fodjo concerns stochastic control problems (for diffusions) with an infinite-horizon discounted or stopped payoff, or with a long-run mean payoff, arising in particular in the modelling of portfolio selection with transaction costs. The dynamic programming method leads to a Hamilton-Jacobi-Bellman partial differential equation on a space whose dimension is at least equal to the number of risky assets. The curse of dimensionality prevents these equations from being solved numerically in large dimension (greater than 5). We propose to tackle these problems with numerical methods combining policy iteration, probabilistic discretizations and max-plus discretizations, in order to reach higher dimensions. Another approach is to replace policy iteration by an approximation by optimal switching problems. These methods should also be applicable to finite horizon problems.
